HAQWA: a Hash-based and Query Workload Aware Distributed RDF Store

Authors

Olivier Curé

Hubert Naacke

Mohamed Amine Baazizi

Bernd Amann

Abstract

Like most data management systems encountered in the Big Data ecosystem, RDF stores manage large data sets by partitioning triples across a cluster of machines. Nevertheless, the graph-based nature of RDF data, together with its associated SPARQL query execution model, makes efficient data distribution more involved than in other data models, e.g., the relational model. In this paper, we propose a novel system characterized by a trade-off between the complexity of data partitioning and the efficiency of query answering in cases where a query workload is known. The prototype is implemented over the Apache Spark framework, ensuring high availability, fault tolerance and scalability. This short paper presents the main features of the system and highlights how parallel computation pervades data fragmentation and allocation, encoding, and query processing.
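
As a rough illustration of what hash-based partitioning of triples can look like, the sketch below assigns each triple to a partition by hashing its subject. The choice of the subject as partitioning key, the CRC32 hash, and the names `partition_by_subject` and `num_partitions` are assumptions made for illustration only; the abstract does not specify HAQWA's actual fragmentation scheme, and this plain-Python version does not use Apache Spark.

```python
import zlib
from collections import defaultdict


def partition_by_subject(triples, num_partitions):
    """Group (subject, predicate, object) triples by a hash of the subject.

    Hypothetical helper: all triples sharing a subject end up in the same
    partition, so a star-shaped query around one subject can be answered
    within a single partition.
    """
    partitions = defaultdict(list)
    for s, p, o in triples:
        # CRC32 is deterministic across runs, unlike Python's built-in
        # hash() on strings, which is randomized per process.
        idx = zlib.crc32(s.encode("utf-8")) % num_partitions
        partitions[idx].append((s, p, o))
    return dict(partitions)


# Toy data with made-up identifiers.
example_triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "ex:knows", "ex:carol"),
]
example_partitions = partition_by_subject(example_triples, num_partitions=4)
```

A purely hash-based placement like this is cheap to compute but ignores which triples are queried together; that is the side of the trade-off a known query workload is meant to improve.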

Similar resources

Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning

Massive volumes of big RDF data are growing beyond the performance capacity of conventional RDF data management systems operating on a single node. Applications using large RDF data demand efficient data partitioning solutions for supporting RDF data access on a cluster of compute nodes. In this paper we present a novel semantic hash partitioning approach and implement a Semantic HAsh Partition...

Storage Balancing in P2P Based Distributed RDF Data Stores

Centralized RDF repositories have been designed to support RDF data storage and retrieval. However, they suffer from the traditional limitations of centralized approaches which are scalability and fault tolerance. Peer to Peer (P2P) networks can provide the scalability, fault-tolerance and robustness, features that the current solutions to local RDF storage do not provide which are needed by th...

Evaluating SPARQL Queries on Massive RDF Datasets

Distributed RDF systems partition data across multiple computer nodes. Partitioning is typically based on heuristics that minimize inter-node communication and it is performed in an initial, data pre-processing phase. Therefore, the resulting partitions are static and do not adapt to changes in the query workload; as a result, existing systems are unable to consistently avoid communication for ...

Adaptive Partitioning for Very Large RDF Data

State-of-the-art distributed RDF systems partition data across multiple computer nodes (workers). Some systems perform cheap hash partitioning, which may result in expensive query evaluation, while others apply heuristics aiming at minimizing inter-node communication during query evaluation. This requires an expensive data pre-processing phase, leading to high startup costs for very large RDF k...

Accessing XML Documents Using Semantic Meta Data in a P2P Environment

XGR (XML Data Grid) and BabelPeers are both data management systems based on distributed hash tables (DHT) that use the Pastry DHT to store data and meta data. XGR is based on the XML data model; BabelPeers uses the Resource Description Framework (RDF) for its data. XGR and BabelPeers have different but complementary functionality. On the one hand, XGR focuses on document-based storage of XML d...

Publication date: 2015
